Abstract

A general concept for the representation of multimedia data by unformatted and formatted data 
is introduced. It leads to a basic-function approach to the design and development of multimedia
database systems, which extends a relational database management system with new attribute 
types. In this paper, raster (or bitmap) images are used as an example. The structure of image 
values is defined, and a basic set of operations for access and manipulation is proposed. These 
operations can be integrated into a query language like SQL. To facilitate a contents-oriented 
search on multimedia data in general and on images in particular, text descriptions are intro- 
duced into the database that allow users to indicate the contents of an image. The well esta- 
blished techniques of information retrieval can be applied to search for these descriptions. The 
proposed system can model images that are assigned to objects as well as stand-alone
images. The paper finally sketches a prototype implementation on top of an existing relational 
database management system (Ingres).

1. Introduction

As database applications become more and more diversified, the capabilities of the current 
commercial database management systems (DBMS) developed on the basis of handling format- 
ted data become less and less satisfactory. In many of the newer applications, handling of mul- 
timedia data such as text, graphics, images, voices, sound, and signal data is important and must 
be dealt with. Such are the cases of managing engineering and office data. However, storing data 
of this kind is one thing; organizing a large amount of them for efficient search and retrieval is 
quite another [LWH87]. Research to develop multimedia DBMS was initiated a few years
ago [Ma87, Ch86, Gi87, WKL86]. Some prototypes have been implemented.

Unfortunately, because of the complexity of managing multimedia data, there are no generally
accepted solutions at this time. In fact, it can be said that there is not yet a good general
solution. Most projects have adopted the approach of developing a specialized system for a
special application (e.g. an office environment or an engineering environment) to reduce
complexity. While this is certainly one way to approach the problem, one can also take a
different direction.

The approach in this paper illustrates an alternative. It is to develop a basic functional
DBMS that can handle multimedia for any application, analogous to the way one constructs a
conventional DBMS for handling formatted data. That is to say, we shall concentrate on
developing a DBMS with the basic functions for retrieving, searching, and managing multimedia
data as we do in handling formatted data. Although there is the opinion that such a DBMS should
be object-oriented, we think that we should start with a simple and well-established data model,
i.e. the relational model, and concentrate on the multimedia data. However, for this approach to
succeed, it is necessary to find a way to reduce the complexity of handling multimedia data.
Thus, we first briefly discuss the complexity issue of multimedia data handling.

The fundamental difficulty in handling multimedia lies in the rich semantics contained in
the multimedia data. In a traditional DBMS, data is always formatted. The semantics that can be
associated with formatted data is very restricted. For example, if the attribute is age and the
unit is years, then a stored value of 34 for this attribute can mean only 34 years of age, and
nothing more. Further interpretation of the data is possible, but takes place at a different
level. This, in fact, gave rise to the research in semantic data modeling, which after many
years is still in its infancy. The problem is difficult and complex. No pat solution is
expected in the near future.

Unfortunately, multimedia data is intrinsically tied to very rich semantics. Consequently,
even a simple extension from formatted data to textual data, for example, already brings much
difficulty. Information retrieval scientists have spent a number of years trying to solve this
problem with some good success. Extending to other kinds of media such as images is much
more difficult. To illustrate the difficulty, one need only look at a simple image of ships.
Given such a picture, how are we to know what kinds of ships are there? Are they destroyers,
cruisers, aircraft carriers, passenger ships, freighters, oil tankers, or something else? Or,
if we are given a picture of a dog and a cat, both running, how are we to know if the dog is
chasing the cat or vice versa? Or are they simply playing with each other?

To answer queries posed on images, a person must draw on the rich experience gathered in
life. Further, the person must also perform integration, analysis, synthesis, and even
extrapolation of his or her knowledge to derive a good answer. One must have a very
sophisticated technique to analyze the content of the images to get the semantics of many
different things. This kind of capability is generally referred to as intelligence. As a
result, persons with limited experience and knowledge, such as a child or someone who has not
been exposed to the various kinds of ships, will not be able to give good answers to queries on
multimedia data.

To expect today's systems to have this kind of capability to answer multimedia queries is
not realistic. Technology has not been developed to this level thus far. Hence, we cannot
develop a DBMS that handles multimedia data to the same extent that we know how to handle
formatted data.

We can, however, do the next best thing. As the proverb says, "a picture is worth ten
thousand words". This means that we can describe a picture or an image by ten thousand words,
although one would never have exactly the same thing, feeling- or meaning-wise. Whether it
takes ten thousand words, more or less, is not so important. What is important is that we can
abstract the content of the image data, sound data, or other forms into words or text. Once we
have the text description, we can say that we have the "equivalent" of the original multimedia
data, at least for searching and analysis purposes. We can then use the techniques developed
for information retrieval and for formatted data to process these multimedia data, since we
know how to handle these kinds of data fairly well. This is the principle we shall use in
developing a DBMS to handle multimedia data for different applications.

The basic concept is that each piece of multimedia data is represented by three parts:
registration data, description data, and raw data. Raw data is the bit string of the data. For
example, for image data, it can be the bitmap of the image. Registration data is the data
related to the physical aspects of the raw data that a device needs to display the raw data.
For example, it includes the color intensity and the colormap for an image. Description data
relates to the content of the multimedia data and is entered by the users. It is in the form of
a natural language description. For example, the image may contain "a battleship docked at the
San Diego harbor". This part of the data will be used for content search for multimedia data
in the system.

As far as the authors know, the use of such a technique to represent multimedia data has not
been proposed before, although registration data and raw data have been used. It is the
definition and integration of the description data that allows us to do the complex content
search of multimedia data that has been elusive to this date. By using the techniques of the
database and information retrieval disciplines, we will be able to handle multimedia data in
ways similar to the handling of formatted data. We can extend the relational structure and the
query interface to allow us to construct a broadly capable multimedia database system for
various applications. Operations for such a system will be described. However, the internal
structure of the system goes beyond the scope of this paper and will not be discussed. Readers
should have no difficulty seeing that there are many alternatives for the internal structure.
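
The three-part representation described above can be sketched as a small data structure. The following Python fragment is only an illustration under stated assumptions: the class names `Registration` and `MultimediaItem`, their fields, and the `matches` helper are hypothetical and are not part of the proposed system.

```python
from dataclasses import dataclass, field


@dataclass
class Registration:
    """Formatted data needed to interpret and display the raw data."""
    width: int             # pixels per row
    height: int            # number of rows
    pixel_depth: int       # bits per pixel, e.g. 1, 8 or 24
    colormap: list = field(default_factory=list)  # (r, g, b) entries, may be empty


@dataclass
class MultimediaItem:
    """One multimedia value: registration, description, and raw data."""
    registration: Registration
    description: str       # natural-language text entered by a user
    raw: bytes             # uninterpreted bit string, e.g. a bitmap

    def matches(self, word: str) -> bool:
        # Content search touches only the description, never the raw data.
        return word.lower() in self.description.lower().split()


item = MultimediaItem(
    registration=Registration(width=4, height=2, pixel_depth=8),
    description="a battleship docked at the San Diego harbor",
    raw=bytes(8),          # 4 x 2 pixels at 8 bits each (all zero here)
)
```

The point of the sketch is that a content query such as `item.matches("battleship")` is answered from the description alone, while the raw data stay untouched until the image has to be displayed or edited.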

In section 2 we introduce a general concept of multimedia data management that can be 
supported by such a DBMS. Section 3 concentrates on images that are used as a representative 
type of multimedia data during prototype development. This makes it necessary to review image 
databases briefly. In section 4 three different relation schemas for the modelling of images and 
their related data are discussed, and the details of the attribute type image are presented. Section 
5 finally sketches the architecture of the prototype being developed. 



2. Data Organization for Multimedia 

Multimedia data are also referred to as unformatted data. More precisely this means that 
their values consist of a variable-length list of many small items the meaning of which is not 
associated with database processing: characters in the case of text, pixels in images, line seg- 
ments and areas in graphics, and so on. There are usually higher-level structures as well (sen- 
tences, paragraphs, 2D objects, scenes), but again they may not be known to the DBMS when 
the data are stored. Invariably, multimedia data are accompanied by some standard formatted 
data called registration data. For text this could be something like document number, name and
affiliation of the author, the word processor used, etc. For images it could be resolution, pixel
depth, source, date of capture, and colormap. The important point about the registration data is that
they are required if anything is to be done with the multimedia data at all, either to interpret
them for replay or display, or to identify them and distinguish them from others. Registration
data can easily be stored in the attributes and tuples of standard relational database systems, thus 
making the full power of query languages available to retrieve and manipulate them. 

While the registration data is indispensable, other formatted (or unformatted) data
describing the contents of multimedia data generally are not on hand. This so-called
description data is per se redundant, because it repeats information already present in the
image, text, or sound. However, because of the complexity and the depth of its information
content, there is hardly any chance to perform an efficient contents-oriented search on the
unformatted raw data themselves. It is much easier to use the description, which is often
structured as formatted data, so that the power of a query language can be applied, as
suggested in the introduction of this paper. It is very difficult and time-consuming to derive
the description automatically (this is called feature and content extraction), although the
areas of natural language understanding, image analysis, and pattern recognition have developed
a number of techniques and algorithms. With these techniques, we have limited success in
feature extraction, but we are nowhere near achieving automatic extraction of information
content. As mentioned in the introduction, this kind of work requires far more intelligence in
a system than we know how to provide today. Thus, it is much easier and more effective to let a
human user provide the description, just as an author provides an abstract and keywords with an
article. In either case the database should hold the result of the extraction, i.e. the
description, and link it to the multimedia data. It is the purpose of a multimedia database
system to provide long-term storage for the multimedia data as well as their description.

The description can be fairly rich and complicated, due to the amount of information
embodied in an image or a signal. New modelling tools like semantic and object-oriented data
models or knowledge representation methods could help to organize it, but are still in an
experimental stage. None of the many different proposals has proven to be clearly superior to
the others. In contrast, the relational model is well established now and has a significant
modelling potential that should be exploited. In cases where it does not suffice, attachment
of plain text to multimedia data offers great improvement at limited cost. It can be entered by
users without special skills, and it can be used to search for multimedia data: all the
well-known techniques of information retrieval can be applied [Sh64, LF73, SM83]. In doing so,
one type of multimedia data (e.g. image) is in fact described with the help of another type of
multimedia data (text) that is easier to handle. This is not unusual: graphics can be used to
describe aspects of an image, and voice can partly be represented by text. However, it should
be noted that this is almost always accompanied by a loss of information.
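
As an illustration of how plain-text descriptions can be searched with a standard information retrieval technique, the following Python sketch builds an inverted index over a few descriptions and answers conjunctive keyword queries. The stopword list, the function names, and the sample descriptions are assumptions made for this example only; the paper does not prescribe a particular retrieval method.

```python
import re
from collections import defaultdict

# A tiny, hypothetical stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "at", "in", "on", "of", "and"}


def tokenize(text):
    """Lower-case the text and split it into words, dropping stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def build_index(descriptions):
    """Map each word to the set of image ids whose description contains it."""
    index = defaultdict(set)
    for image_id, text in descriptions.items():
        for word in tokenize(text):
            index[word].add(image_id)
    return index


def search(index, query):
    """Return ids of images whose description contains every query word."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


descriptions = {
    1: "a battleship docked at the San Diego harbor",
    2: "an oil tanker in a storm",
    3: "a freighter and a battleship in a convoy",
}
index = build_index(descriptions)
```

With this index, `search(index, "battleship harbor")` selects image 1 without reading a single pixel, which is the kind of contents-oriented search the text descriptions are meant to enable.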

Multimedia data, their registrations and their descriptions can be used in various ways, as 
sketched in fig. 1. Any access to the raw data must go "through" the registration data to make 
sure that the raw data are interpreted correctly. Editing operations on the raw data including 
filtering, clipping, bitmap operations for images, stripping of layout commands and control char- 
acters for text, etc. are permitted. Special operators that are applied to the description data can 
be distance and volume calculations on geometric data [CF80], or the addition of synonyms in 
the case of keywords. These operators can actually do a lot of processing without ever touching 
the raw data. In fact, it is expected that most of the processing, except the editing of the raw 
data, will be done outside the raw data. Some of these operators cannot be implemented with
commands of the query language only. They need the features of a general-purpose
programming language. New data models will allow them to be incorporated into the database as
"procedures" or "methods".

Figure 1: Groups of Operations on Multimedia Data and on the Associated Formatted Data

To make the following discussion more explicit, we shall concentrate on images as a 
representative type of multimedia data. This allows us to define registrations, descriptions, and 
operations in detail. We plan to do similar things for the other types of multimedia data as well. 



3. Image Database Systems 

There is quite a tradition of database support for image management and image analysis
[CK81, TY84]. Some of the approaches concentrate on the description data, while others
address the raw data and registration data first. None has been found to address raw data,
registration data, and description data in any thorough fashion.

Raw image data consist of a matrix of pixels (picture elements). Each pixel indicates the
color or greyness of a small (atomic) portion of the image. It can be encoded by a single bit to
indicate black or white. Alternatively, several bits can be used to encode a pixel, e.g. 8 or 24.
The number of bits per pixel is called the pixel depth. As the size of the image (the number of
pixels in rows and columns) as well as the depth can vary, the raw data appear just as a string of
bits that can only be interpreted if the size and the depth are known. Hence, size (also called
resolution) and depth are first examples of registration data.
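
To make concrete why size and depth are needed, the following sketch decodes the same bit string under two different registrations. The function `decode_pixels` and its storage assumptions (row-major order, most significant bit first, no padding between rows) are hypothetical choices for this example, not a format defined by the paper.

```python
def decode_pixels(raw: bytes, width: int, height: int, depth: int):
    """Turn a packed bit string into a height x width matrix of pixel values.

    Pixels are assumed to be stored row by row, most significant bit first,
    with no padding between rows (an assumption of this sketch).
    """
    needed_bits = width * height * depth
    if len(raw) * 8 < needed_bits:
        raise ValueError("raw data too short for the given size and depth")
    bits = int.from_bytes(raw, "big")
    total_bits = len(raw) * 8
    mask = (1 << depth) - 1
    rows = []
    for y in range(height):
        row = []
        for x in range(width):
            index = y * width + x
            # Position of this pixel's lowest bit, counted from the right.
            shift = total_bits - (index + 1) * depth
            row.append((bits >> shift) & mask)
        rows.append(row)
    return rows


raw = bytes([0b10110000])
as_bitmap = decode_pixels(raw, width=4, height=2, depth=1)
as_2bit = decode_pixels(raw, width=2, height=2, depth=2)
```

The single byte yields a 4 x 2 black-and-white bitmap in the first call and a 2 x 2 image with four grey levels in the second, so the raw data alone do not determine the image.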

Pixels may either define a color/greyness value directly or index a so-called colormap. A 
typical colormap contains 256 entries each of which specifies the particular intensities of the 
three basic colors red, green, and blue, or defines a certain color in another way. To display an 
image on a particular device, special storage segments or registers assigned to that device must 
be loaded with the colormap. The colormap can have a variable length, thus it is debatable
whether it belongs to the raw data or to the registration data. Because it is needed to interpret 
and reproduce the image and because its size is rather limited, we classify it as registration data. 

If the pixels consist of 8 bits each, up to 256 colors can be used in that image. If there are
24 bits per pixel, each 8-bit portion addresses a different entry in the colormap: The first one is
used to obtain the intensity of red only, the second and third are used for green and blue
respectively. Thus 2^24 colors can be used in the image.
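
The two ways of indexing the colormap can be sketched as follows. The function names are hypothetical, and the sketch assumes that the first 8-bit portion of a 24-bit pixel occupies the most significant bits; the text does not fix the bit order.

```python
def lookup_8bit(pixel: int, colormap):
    """An 8-bit pixel is one index into the colormap."""
    return colormap[pixel]


def lookup_24bit(pixel: int, colormap):
    """A 24-bit pixel holds three 8-bit indexes, one per basic color."""
    r_index = (pixel >> 16) & 0xFF   # first portion: red
    g_index = (pixel >> 8) & 0xFF    # second portion: green
    b_index = pixel & 0xFF           # third portion: blue
    # Each index picks an entry, but only one component of it is used.
    return (colormap[r_index][0], colormap[g_index][1], colormap[b_index][2])


# A hypothetical 256-entry grey ramp: entry i is (i, i, i).
colormap = [(i, i, i) for i in range(256)]

colors_8bit = 2 ** 8      # 256 distinct colors
colors_24bit = 2 ** 24    # 256 * 256 * 256 = 16777216 distinct colors
```

Because the three indexes are chosen independently, the number of combinations is 256 x 256 x 256, which is where the figure of 2^24 comes from.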

The use of a colormap primarily saves storage. Instead of repeating the definition of a color 
in thousands of pixels, it is done only once in the table entry where it can occupy several bytes. 
However, this indirection has more advantages: Instead of using the basic colors red, green, and 
blue (RGB) the encoding in the colormap could as well be done in terms of "intensity, hue, and 
saturation" (IHS) or the "YIQ" defined by the National Television Systems Committee. This can 
be required for the output on certain types of monitors. Formulae are available to calculate one 
color definition from the other [Ni86, BB82]. The translation is restricted to the 256 entries of 
the colormap and does not touch the 10000 or more pixels of the image. Finally, modifying the 
colors of an image can be used to highlight minimal color changes and thus to make visible hid- 
den shapes on an image, or to perform some simple animations. 
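
The benefit of translating only the colormap can be sketched with Python's standard `colorsys` module, which offers an RGB-to-YIQ conversion. The helper name `translate_colormap`, the grey-ramp colormap, and the tiny pixel matrix are assumptions for this example; the coefficients used by `colorsys` are those of that module and need not match the formulae in the cited references.

```python
import colorsys


def translate_colormap(colormap_rgb):
    """Convert RGB colormap entries (components in 0..255) to YIQ."""
    return [colorsys.rgb_to_yiq(r / 255, g / 255, b / 255) for r, g, b in colormap_rgb]


colormap_rgb = [(i, i, i) for i in range(256)]   # hypothetical grey ramp
pixels = [[0, 128, 255], [255, 128, 0]]          # indexes into the colormap

# Only the 256 table entries are converted; the pixel matrix is not touched.
colormap_yiq = translate_colormap(colormap_rgb)
```

The cost of the translation depends on the size of the colormap only, no matter how many pixels the image has.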

Some image identification should also be part of the registration data to be able to distin- 
guish images properly. Depending on the application this could be merely an arbitrary number, 
a combination of source (camera, satellite) and time, or other similar schemes. 

How are raw data and registration data integrated into a database system? Some systems
simply put them in files, e.g. HIDES [TM77, Ta80b] and EMDB [LU77, LH80]. This means that
they do not offer a data model, but only a set of operations (subroutines) to access and
manipulate the image files. Others have moved the registration data to a relational database
system and linked them to the raw data in the files, e.g. REDI/IMAID [CF79, CF81] and GRAIN
[CRM77, LC80]. They use special relations in which each tuple stands for one image. Display and
editing operations can be applied to the tuples of these relations. However, as G.Y. Tang
pointed out in [Ta80a], it is not clear what the semantics of the standard relational operators
should be when they are applied to those image relations. In particular, when two image tuples
are joined, which of the images is represented by the result? Both of them?

For this reason, Tang proposed that the raw data should be conceptually represented in the 
data model as attribute values. This does not imply anything for the storage structures. Inter- 
nally, images can still be kept in separate files, but they are now accessible through the query 
language. The display and editing operators are applied to the attribute, not to the tuple. Joining 
two tuples with image attributes yields a tuple with more than one image attribute, which can be
handled easily.

Tang himself and Grosky [Gr84] have designed data models based on this approach, but
neither of them has reported a successful implementation. The IBM Tokyo Scientific Center has
in fact implemented a system called ADM (Aggregate Data Manager) that is based on System R
and uses SQL as a query language [TII79]. Some of the registration data are handled in the form
of type information, i.e. there are different domains used for binary and grey-tone images. Using
SQL queries, images can be retrieved as attributes in relations and tuples and can then be moved
to a workspace, where a variety of editing operations can be applied to them. The resulting
image can be reinserted into the database. Unfortunately, the program interface is not explained
in the paper; it is expected to be some modification of the SQL embedding. However, this
approach seems more appropriate than that of [Ta80a]. We shall adopt the ADM concept as a
starting point for our system and develop it in more detail. The authors of the ADM model
themselves have suggested the extension of their system to other types of multimedia data
[TII79], but we could not find out whether they have actually pursued that goal.

Other image DBMS like IMAID and GRAIN have put much more emphasis on the image
description data. They are stored in relations with a special structure (e.g. attributes holding
geometric coordinates) that can be used as input to pictorial operators. It should be noted that
this always implies a slight restriction towards a specific domain, in this case Landsat
photographs. Lines detected almost immediately resemble objects like highways, rivers, or city
boundaries. This is different from analyzing arbitrary photographs of three-dimensional objects,
where it is much harder to relate a line to an object. Hence, we propose to build applications
like that on top of a database system and use it to hold the images as well as the descriptions.



4. Extending the Relational Model with the Data Type Image

In this section we shall discuss the data type IMAGE in more detail. We begin with a look 
at some modelling issues of assigning images to objects and vice versa, which have not been 
addressed by the papers cited in the last section. 



4.1. The Relationship of Objects and Images 

IMAGE is a new attribute domain, i.e. an image is supposed to be an attribute of some 
object or entity (a ship or an aircraft, for instance). Usually it is an attribute of the object shown 
on the picture, but that need not be the case. Making IMAGE an attribute does not prevent the
treatment of pictures as stand-alone objects (see relation schema type 3 below). The simplest
way of assigning an image to an object leads to a relation schema like this: 

OBJECT ( O-ID , ... , O-IMAGE) 

OBJECT is the name of the relation such as SHIP, CAR, or PERSON, followed by a list of
attributes. The object identifier O-ID is underlined to indicate that it is the primary key. We
denote this as the relation schema type 1. Its advantage is that access to the tuple describing
an object fetches the image, too. More than one attribute of type IMAGE can be defined for a
relation. However, it may often be the case that the number of images per object varies. If
first normal form is required, such repeating groups can only be modelled by a separate
relation. Hence, there is a relation schema type 2:

OBJECT ( O-ID , ... ) 

OBJECT-IMAGE ( O-ID , O-IMAGE )

In the relation OBJECT-IMAGE the O-ID alone cannot serve as a key, because there may be 
several images of one object, leading to several tuples with the same O-ID. Thus O-IMAGE has 
to be included to make the key unique. The fact that an attribute of type IMAGE is part of the 
primary key might lead to severe implementation problems, but we do not consider them here 
(introducing an image identifier can help). Access to an image is not as simple as it was with 
schema type 1, for a natural or outer join is required. If the tuple of the object is available, a 
selection on the OBJECT-IMAGE relation must be performed, using the given object identifier. 

Another problem with the two approaches discussed so far is that a picture showing several 
objects must be stored redundantly, i.e. the same image is repeated in the relation for the number 
of different objects "having" (shown on) this image. The database system treats the copies as 
different images. To avoid this, a relation schema type 3 has to be used: 

OBJECT ( O-ID , ... )

IMAGE-OBJECT ( I-ID , I-IMAGE )

IS-SHOWN-ON ( O-ID , I-ID , COORDINATES, ... )

The COORDINATES can be used to give the approximate position of the object on the image. 
Please note that we do not distinguish the statement "object x has an image y" from "object x is 
shown on image y", but represent both by the same modeling concept. Now it becomes even 
more complicated to find the images of an object:

NATJOIN (SELECT O-ID=object1 (IS-SHOWN-ON), IMAGE-OBJECT)

NATJOIN stands for the natural join of two relations, i.e. the equi-join on the attributes with the
same name (IS-SHOWN-ON.I-ID = IMAGE-OBJECT.I-ID). Each image is stored only once,
regardless of how many objects it shows. It is possible now to start with an image and to retrieve 
the depicted objects: 

NATJOIN (OBJECT, SELECT I-ID=image1 (IS-SHOWN-ON))

One could even define a window on the image, use it to restrict the coordinates, and thus retrieve 
only the objects shown in the window. Hence, the third type of relation schema is a little bit 
unwieldy, but it provides the highest degree of freedom in modelling and processing (even 
images with unknown contents can be stored). 
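
Relation schema type 3 and the two access paths discussed above can be tried out on an ordinary relational system. The sketch below uses SQLite through Python's `sqlite3` module; the lower-case table and column names with underscores, the `name` column, the `x` and `y` coordinate columns, and the sample rows are assumptions for this example, and a plain BLOB stands in for the IMAGE attribute type.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE object (o_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE image_object (i_id INTEGER PRIMARY KEY, i_image BLOB);
    CREATE TABLE is_shown_on (
        o_id INTEGER REFERENCES object (o_id),
        i_id INTEGER REFERENCES image_object (i_id),
        x INTEGER, y INTEGER,            -- approximate position on the image
        PRIMARY KEY (o_id, i_id)
    );
    INSERT INTO object VALUES (1, 'destroyer'), (2, 'oil tanker');
    INSERT INTO image_object VALUES (10, x'00'), (11, x'01');
    INSERT INTO is_shown_on VALUES (1, 10, 5, 5), (2, 10, 40, 8), (1, 11, 0, 0);
""")


def images_of(o_id):
    """Select on IS-SHOWN-ON by object, then join with IMAGE-OBJECT."""
    rows = db.execute(
        "SELECT i.i_id FROM is_shown_on s JOIN image_object i ON s.i_id = i.i_id "
        "WHERE s.o_id = ? ORDER BY i.i_id", (o_id,))
    return [r[0] for r in rows]


def objects_on(i_id):
    """Start with an image and retrieve the depicted objects."""
    rows = db.execute(
        "SELECT o.name FROM object o JOIN is_shown_on s ON o.o_id = s.o_id "
        "WHERE s.i_id = ? ORDER BY o.o_id", (i_id,))
    return [r[0] for r in rows]
```

Image 10 shows both objects but is stored only once; the IS-SHOWN-ON rows carry the many-to-many relationship in both directions.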

The three schema types are depicted in fig. 2. The dotted line indicates a primary-key- 
foreign-key relationship (one-to-many). A relational database system extended by image attri- 
butes supports all of them. The choice depends on the application. If there is at most one image 
per object and each image shows only one object (e.g. a database of employees), then type 1 is 
most appropriate. 

There is one problem with schema type 3 that has not been mentioned yet: There may be
different types of objects, e.g. ships, aircraft, and submarines, each represented by a different
relation. In this case different IS-SHOWN-ON relations are needed as well, for the domain of the
O-ID part of the key cannot be the union of the domains of all the object identifiers. This makes
the path from a picture to the shown objects really awkward. The introduction of a
generalization hierarchy with a superclass 'object' is a solution, but that goes beyond the
relational model.

Figure 2: The Three Relation Schema Types for Storing Images



4.2. The IMAGE Data Type

As indicated earlier, not all the operations of the relational algebra can be performed
directly on the data type IMAGE. They treat an IMAGE value as a whole, i.e. projection either
drops it completely or keeps it in the result. The comparisons needed in selections and joins
cannot be performed on the whole image. Even the definition of equality is rather complex for
images, whereas it is easy to see what "pixel depth = 8" means. Hence, IMAGE should be
regarded as an abstract data type with its own set of operators or functions, some of which map
the complex domain IMAGE to standard domains like number or string. The result of these
functions can be used in selections and joins without problems. To identify the functions, we
have to take a closer look at the structure of an IMAGE value. It will have the three parts
introduced before, namely raw data, registration, and description. Raw data and the registration
are intrinsically tied together, so they will both be covered in the next subsection, while
description data are discussed separately after that.



4.2.1. Raw Data and Registration Data 

The registration data could be stored in normal attributes next to the IMAGE attribute, but
then it would be the user's responsibility to define them, and the display of an image could be
impossible if the user forgot some of them. Hence, to make sure that they are available for
every IMAGE attribute, those required to interpret the pixel matrix are made part of the IMAGE
value, as shown in fig. 3. They can be seen as internal or hidden attributes, and they are accessed
through operators of the IMAGE data type. This is almost as easy as the access to the other
attributes. The registration data identifying an image are application-dependent and thus are
kept in ordinary attributes (compare the O-ID and the I-ID in the schema examples of the last
section).

Figure 3: Conceptual View of an Instance or Value of the Abstract Data Type IMAGE



The internal attribute named "encoding" specifies the way the colors are defined in the
colormap, or in the pixels, if no colormap is available. Possible values may be "RGB" or "IHS",
but it must also indicate how the values of the three components are encoded, i.e. integer or real,
and how many bits they use (8, 24, 32). This, of course, must be consistent with the depth of the
colormap or the depth of the pixels. If a colormap is used, its size must further be consistent with
the depth of the pixels. However, the depth may sometimes be set to 8 bits although fewer than
256 colors are used, in which case the pixel values have to be consistent with the size of the
colormap.
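
The consistency conditions between pixel depth, colormap size, and pixel values can be sketched as a simple check. The function `check_registration` is hypothetical and covers only the case where each pixel is a single index into the colormap; it also assumes a non-empty pixel matrix.

```python
def check_registration(pixel_depth, colormap, pixels):
    """Return a list of inconsistencies between depth, colormap, and pixels.

    Assumes directly indexed pixels (one colormap index per pixel) and a
    non-empty pixel matrix; both are assumptions of this sketch.
    """
    problems = []
    if colormap:
        # The colormap must not be larger than what the depth can address.
        if len(colormap) > 2 ** pixel_depth:
            problems.append("colormap has more entries than the pixel depth can address")
        # With a short colormap, every pixel value must still hit an entry.
        largest = max(max(row) for row in pixels)
        if largest >= len(colormap):
            problems.append("pixel value %d has no colormap entry" % largest)
    return problems


sixteen_colors = [(0, 0, 0)] * 16
ok = check_registration(8, sixteen_colors, [[0, 15]])
bad_pixel = check_registration(8, sixteen_colors, [[0, 16]])
bad_size = check_registration(4, [(0, 0, 0)] * 256, [[0]])
```

A DBMS that keeps these values inside the IMAGE value could run such a check once, when the value is constructed, instead of leaving it to every application.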

To read an attribute of type IMAGE from the database into a program, one could use a very 
complex, variable-length record structure in the program. It seems more convenient to make the 
components of an IMAGE value accessible only through functions. This has the additional 
advantage that the program is even more independent of the storage structures and data encod- 
ings used by the DBMS. For instance, the function 

CONSTRUCT_IMAGE (resolution, pixel_depth, encoding, colormap_size,
                 colormap_depth, colormap, pixel_matrix)

produces a (transient) value of type IMAGE that cannot be assigned to program variables, but 
can only be used in INSERT and UPDATE statements of the query language. It reads a number 
of input parameters (variables or results of other functions) and combines their values into a sin- 
gle value of type IMAGE. To illustrate this, we show how the CONSTRUCT_IMAGE function 
could be used in the query language SQL [Ch76] (cf. relation schema type 3 above): 

UPDATE IMAGE-OBJECT
SET I-IMAGE = CONSTRUCT_IMAGE ($resolution, $depth, RGB_REAL_32, 256, ... )
WHERE I-ID = 1234;

INSERT (4567, CONSTRUCT_IMAGE ($resolution, 24, IHS_INT_8, 0, ... ))
INTO IMAGE-OBJECT;

Identifiers with a leading dollar sign represent program variables, whereas parameters with capi- 
tal letters only indicate named constants. 

It should be clear at this point that this kind of command notation may be appropriate for
the programmer, but not for the end-user. The interface for the latter should offer menus and
icons to specify the source of the image to be stored. Even if only text input is possible,
functions like READ_CAMERA (device_id) should be used in place of CONSTRUCT_IMAGE. The
program that actually implements the user interface with the help of the query language could
utilize this to avoid unnecessary copying of the large pixel matrix: It replaces the parameter
variable in the internal CONSTRUCT_IMAGE call by a function call that reads the camera input
(usually part of the driver software that is delivered with a camera):

CONSTRUCT_IMAGE (READ_RGB_CAMERA ($camera_id), ... )

Avoiding intermediate storage and unnecessary copying is a very important design issue in mul- 
timedia databases. We shall return to this in section 5. 

Retrieving attribute values of type IMAGE from the database into program variables uses 
another set of functions like: 

GET_RESOLUTION (IMAGE attribute) : resolution_type;
GET_DEPTH (IMAGE attribute) : integer;
GET_ENCODING (IMAGE attribute) : encoding_type;
GET_8BIT_COLORMAP (IMAGE attribute) : array [0:255] of record ...;
GET_8BIT_RASTER (IMAGE attribute) : array [0:n, 0:m] of 8bit_int;
etc.

Each function has a specific output type. Different functions can be defined to produce different 
output types for the same component of an IMAGE attribute. A query may look like this: 

SELECT GET_8BIT_RASTER (I-IMAGE), GET_8BIT_COLORMAP (I-IMAGE) 

INTO $rgb_screen, $rgb_colormap 
FROM IMAGE-OBJECT 
WHERE I-ID = 35; 

Instead of copying the image into program variables, it should again be possible to send it to an
output device directly. To do so, the DBMS might be required to perform some transformation
on the colormap and the pixel matrix, e.g. change the RGB encoding to IHS. It has not been
decided yet what the syntax for that should look like. One could also think of many other access
functions like GET_WINDOW, GET_ZOOMED_IMAGE, etc. [LH80, 11179]. The system is
planned so that such functions are easy to add when it seems appropriate.



4.2.2. Description Data 

Some contents of an image can be represented by linking it to the objects it shows (fig. 2). 
That is not enough if we want to use the description data instead of the raw data whenever possi- 
ble, especially in search. It does not say anything about how the objects are shown on the image. 
For instance, a ship could be shown in a harbor, out on the sea, in a storm, or in a convoy. Nei- 
ther does it say anything about the relation or interaction of the objects shown on the same 
image. 

We have already pointed out why a text description seems to be most appropriate. In general,
text can also cause some problems: it can be imprecise, it depends on the capabilities of the
author, and it can be ambiguous. In this context, however, we need a special type of text that
differs from the intended general multimedia data type TEXT in several aspects:

- it is not self-contained, but refers to other data

- it has a very simple structure (one paragraph) 

- it is explicitly determined to support search. 

Therefore, we have decided to tie the image and its description together, that is, we enhance the 
IMAGE type with a description part. It consists of a set of phrases or sentences that characterize 
the contents of the image (fig. 3). The set notion implies that each phrase or sentence is indepen- 
dent of all the others, which necessarily leads to some repetitions (of nouns), but makes it much 
easier for the search mechanisms to grasp the meaning - or at least the important phrases. A typ- 
ical example would be: 

dog chases cat; 

cat is running from left to right; 
a house in the background; 
front door of house is open; 

This is still easy to enter and easy to read for human beings, but it also gives the system much
more opportunity to distinguish images and to locate the ones that fit a query.

The description part can be empty, if an image is entered into the database that nobody has 
looked at yet. To add the description later, a function will be provided that takes an IMAGE 
value as input, expands the set of description phrases by the given new ones, and produces a new 
IMAGE value that can be assigned to an IMAGE attribute: 

UPDATE IMAGE-OBJECT 

SET I-IMAGE = ADD_DESCRIPTION (I-IMAGE, 

{ dog playing with cat,
dog and cat chasing ball,
dog runs from left to right,
cat runs from right to left,
ball is between dog and cat,
ball bounces up in the air,
dog and cat are in the backyard of a house } )

WHERE I-ID= 1122; 

This is particularly well suited to situations where someone examines an image and adds his or her
observations to those that others have entered before, thereby sharing the new knowledge with
all the users of the system.
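
The behaviour of ADD_DESCRIPTION can be sketched in a few lines of Python. The fragment below is a minimal illustration, not the planned implementation: the class Image, the field raster_id standing in for the raw data, and the trimming of whitespace are our own assumptions. It shows the two properties stated above, namely that the description is a set and that the function produces a new IMAGE value instead of modifying its input.

```python
from dataclasses import dataclass, replace
from typing import FrozenSet, Iterable


@dataclass(frozen=True)
class Image:
    """Simplified IMAGE value; the raw data is reduced to an opaque handle."""
    raster_id: str                              # hypothetical reference to the pixels
    description: FrozenSet[str] = frozenset()   # may be empty


def add_description(image: Image, phrases: Iterable[str]) -> Image:
    """Return a new IMAGE value whose description also holds `phrases`."""
    cleaned = {p.strip() for p in phrases if p.strip()}
    # Set union: phrases entered twice are kept only once.
    return replace(image, description=image.description | cleaned)
```

Because the result is a new value, it can be assigned to an IMAGE attribute in an UPDATE statement exactly like the result of CONSTRUCT_IMAGE.
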

In case the contents of an image are already known when it is stored, the two functions can
be combined in the INSERT command: 

INSERT (4387, ADD_DESCRIPTION (CONSTRUCT_IMAGE ( ... ),

{ USS Enterprise in the South Pacific on the way to ... } ))

INTO IMAGE-OBJECT; 

Other functions can be defined to read the set of descriptions or to delete elements from it. 
The most important operator is the one used in search: CONTAINS (IMAGE attribute, tem- 
plate). The simplest form of a template is just a word, and CONTAINS yields true, if any of the 
phrases in the description contains that word. The template may be more complicated, containing
several words that must appear in an arbitrary or given order, or specifying "wild cards" for
unknown parts of a word. Many access paths and indexing methods are available to support this 
kind of retrieval [Fa85, KW81, KSW79]. 

To give an example of how the description can be used in the retrieval of images, consider 
a relation schema type 2 with ships as objects. The following query tries to find out whether the 
database holds some photos of a sinking destroyer: 

SELECT SHIP.S-ID, GET_RESOLUTION (S-IMAGE), .... 

INTO $ship_id, $resolution, .... 

FROM SHIP, SHIP-IMAGE 

WHERE SHIP.CLASS = "destroyer" 

AND SHIP.S-ID = SHIP-IMAGE.S-ID 

AND CONTAINS (SHIP-IMAGE.S-IMAGE, "sinking"); 

As another example, consider the image of a cat and a dog playing as given above, and a relation
schema of type 3. If we want to find all the pictures that show a dog playing with a cat, and both
are chasing after a ball, we can use a query like the following:

SELECT GET_RESOLUTION (I-IMAGE), ...

INTO $resolution, ...

FROM IMAGE-OBJECT 

WHERE CONTAINS (I-IMAGE, 

"cat | play* | dog",

"dog & chas* & ball", 

"cat & chas* & ball" ); 

The | symbol between two given words requests that both words appear in the same sentence, but
in an arbitrary sequence. The & symbol means that between the two words there may be other
words that are ignored in the selection process. The * symbol finally matches strings of arbitrary
length (not containing spaces), usually part of a single word like a prefix or suffix. Hence, if the
sentence "a black cat plays with a brown dog running in the backyard" is used to describe the
contents of a picture, this satisfies the first search pattern in the SELECT query, and so does the
phrase "dog playing with cat" entered in the example UPDATE operation above.

The details of the syntax for search patterns or templates are still to be determined. It is
desirable to have Boolean operations. For instance, the given example assumes a logical
conjunction (and) between the three search patterns. That means each search pattern must
be satisfied by at least one of the phrases for the image to be selected. Naturally, one would also
like to specify a disjunction (or), a negation (not), or combinations of all.
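
Since the template syntax is not yet fixed, the following Python fragment is only a sketch of the semantics described so far. It assumes that a pattern uses either | or &, never both, that matching is case-insensitive, and that phrases are split into words at non-alphanumeric characters; none of these details is decided in the proposal. The function name contains and its helpers are illustrative.

```python
import re
from typing import Iterable


def _word_regex(word: str) -> "re.Pattern[str]":
    """Translate one template word; '*' matches any run of non-space characters."""
    parts = [re.escape(part) for part in word.split("*")]
    return re.compile(r"\S*".join(parts), re.IGNORECASE)


def _matches_phrase(pattern: str, phrase: str) -> bool:
    """Check a single search pattern against a single description phrase."""
    tokens = re.findall(r"\w+", phrase)
    if "&" in pattern:
        # '&': the words must occur in the given order; other words may intervene.
        pos = 0
        for word in (w.strip() for w in pattern.split("&")):
            regex = _word_regex(word)
            while pos < len(tokens) and not regex.fullmatch(tokens[pos]):
                pos += 1
            if pos == len(tokens):
                return False
            pos += 1
        return True
    # '|' (or a single word): all words must occur in the phrase, in any order.
    words = [w.strip() for w in pattern.split("|")]
    return all(any(_word_regex(w).fullmatch(t) for t in tokens) for w in words)


def contains(description: Iterable[str], *patterns: str) -> bool:
    """True if every pattern is satisfied by at least one phrase (conjunction)."""
    phrases = list(description)
    return all(any(_matches_phrase(p, ph) for ph in phrases) for p in patterns)
```

A disjunction or negation could be layered on top by combining the results of several contains calls; the sketch evaluates every phrase sequentially and does not use any of the access paths mentioned above.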

Apart from registration data and description data there are some other issues about images 
that a multimedia DBMS should support, e.g. the management of subimages [Ta80a, Gr84]. The
relation schema type 3 with the coordinates in the IS-SHOWN-ON relation already provides a 
way to define subimages, but it is rather cumbersome to extract them for display 
(GET_WINDOW function). Instead we want to access a subimage as easily as the full image - 
without storing the pixels redundantly. There are several ways to do this, and we have not yet
decided among them. However, it seems clear that the system has to support two different concepts:
First, subimages can be derived from other images by selecting rectangular subsections. Second, 
several images can be combined to form a larger image. The latter is particularly useful for 
Landsat photographs. Ideally, both concepts can be handled by the same mechanism. We plan 
to further extend the data type IMAGE by some kind of reference to other images. 
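
One conceivable way to handle both concepts with the same mechanism is sketched below in Python. It is not the design we have chosen, only an illustration of the idea of references: the class names StoredImage, SubImage, and Mosaic are hypothetical, and the combination of images is restricted to placing them side by side. Only StoredImage holds pixels; the other two hold references and translate coordinates.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StoredImage:
    """An image whose pixel matrix is actually stored (row-major)."""
    pixels: Tuple[Tuple[int, ...], ...]

    @property
    def width(self) -> int:
        return len(self.pixels[0])

    @property
    def height(self) -> int:
        return len(self.pixels)

    def pixel(self, x: int, y: int) -> int:
        return self.pixels[y][x]


@dataclass(frozen=True)
class SubImage:
    """Rectangular section of another image; holds a reference, no pixels."""
    source: object          # StoredImage, SubImage, or Mosaic
    x0: int
    y0: int
    width: int
    height: int

    def pixel(self, x: int, y: int) -> int:
        if not (0 <= x < self.width and 0 <= y < self.height):
            raise IndexError("coordinates outside the subimage")
        return self.source.pixel(self.x0 + x, self.y0 + y)


@dataclass(frozen=True)
class Mosaic:
    """Larger image formed by placing images of equal height side by side."""
    tiles: tuple

    @property
    def width(self) -> int:
        return sum(tile.width for tile in self.tiles)

    @property
    def height(self) -> int:
        return self.tiles[0].height

    def pixel(self, x: int, y: int) -> int:
        if x < 0:
            raise IndexError("coordinates outside the mosaic")
        for tile in self.tiles:
            if x < tile.width:
                return tile.pixel(x, y)
            x -= tile.width
        raise IndexError("coordinates outside the mosaic")
```

Since all three classes offer the same pixel access, a subimage of a mosaic or a mosaic of subimages needs no further code.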



5. Architecture of a Prototype 

The prototype is intended to cope only with the management of images, including storage
organization, query and browsing facilities, and presentation issues. To keep the effort limited 
and to make the functionality of the envisioned system available as soon as possible, it is being 
built around an existing relational DBMS (Ingres [RTI85, RTI87]). That implies that perfor- 
mance will not be an issue in the first version. The high-level architecture is shown in fig. 4. 

The dialogue manager can be regarded as the main program. It calls the device manager to
perform the exchange of the data with the user, employing a variety of input/output devices. It
also calls the DBMS interface to store and retrieve the data, and maintains the state of the dialogue
with the user. The device manager is to hide the specific details of the different I/O devices
(cameras, monitors, VCRs) and to provide the dialogue manager with a more abstract view
of their capabilities (comparable to the HIOMM in [WLK87]). The DBMS interface implements
the query language sketched in section 4.2. It gives the dialogue manager (and other applications)
the illusion of using a DBMS with integrated image management facilities. In fact it
engages two different systems, a standard relational DBMS for the structured data and a picture
manager. The picture manager is responsible for storing all the images in standard files. Each
image will be given a unique identifier (e.g. the file name) that is used by the relational database
to refer to the image.

Figure 4: Architecture of the Prototype
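
The division of labour between the relational DBMS and the picture manager can be illustrated with a small Python sketch. It is a stand-in under explicit assumptions: SQLite replaces Ingres, the table and column names are invented, and the class and method names (PictureManager, DbmsInterface, insert_image, get_image) are hypothetical. Only the principle is taken from the text: images live in ordinary files, and the relation stores the file name as identifier.

```python
import sqlite3
import uuid
from pathlib import Path


class PictureManager:
    """Stores each image in an ordinary file and returns its identifier."""

    def __init__(self, directory: Path):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def store(self, data: bytes) -> str:
        image_id = uuid.uuid4().hex + ".img"    # the file name is the identifier
        (self.directory / image_id).write_bytes(data)
        return image_id

    def load(self, image_id: str) -> bytes:
        return (self.directory / image_id).read_bytes()


class DbmsInterface:
    """Gives callers the illusion of one DBMS with image attributes."""

    def __init__(self, db: sqlite3.Connection, pictures: PictureManager):
        self.db = db
        self.pictures = pictures
        # Assumed relation layout: the IMAGE attribute holds only the identifier.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS image_object "
            "(i_id INTEGER PRIMARY KEY, i_image TEXT NOT NULL)")

    def insert_image(self, i_id: int, data: bytes) -> None:
        image_id = self.pictures.store(data)
        self.db.execute("INSERT INTO image_object VALUES (?, ?)", (i_id, image_id))
        self.db.commit()

    def get_image(self, i_id: int) -> bytes:
        row = self.db.execute(
            "SELECT i_image FROM image_object WHERE i_id = ?", (i_id,)).fetchone()
        if row is None:
            raise KeyError(i_id)
        return self.pictures.load(row[0])
```

The sketch does not coordinate the file write with the database transaction; a real interface would have to decide what happens when one of the two steps fails.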

All the interfaces must be defined in detail. To do so, we have to investigate the different 
ways to encode images and the transformations required during input (capture) and output 
(display). What does the actual signal read from a camera through a video board look like? What
has to be sent to the various types of monitors? And what is a suitable standard format that can 
cover both and thus avoid redundant storage of pictures? 

Implementing the interfaces should take into account that we have to avoid copying the 
whole picture whenever possible. A good idea might be to pipe the data read from the database 
through the transformation process into the monitor driver. To do this the dialogue manager 
should be written in the style of functional programming: 

DISPLAY (TRANSFORM (DB-ACCESS (attribute,

db-access-parameters),

transformation-parameters),

display-parameters)

The analogous solution can be used for data capture. This gives the implementation the freedom 
not to copy the data but to hand over pointers instead. The only main memory copy of an image 
will then probably reside in the DBMS interface.
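
The nested call can be mimicked with Python generators, as in the following sketch. It only illustrates the piping idea: the function names follow the pseudo-call above, but their parameters (a row source, a per-pixel function, a row writer) are our own simplification. Rows are handed on one at a time, so the whole picture is never copied as a unit; note that the transformation in this sketch still builds one new row per step.

```python
from typing import Callable, Iterable, Iterator, List

Row = List[int]


def db_access(stored_rows: Iterable[Row]) -> Iterator[Row]:
    """Yield the raster row by row instead of materializing a copy."""
    for row in stored_rows:
        yield row


def transform(rows: Iterable[Row], pixel_fn: Callable[[int], int]) -> Iterator[Row]:
    """Apply a per-pixel transformation lazily, one row at a time."""
    for row in rows:
        yield [pixel_fn(p) for p in row]


def display(rows: Iterable[Row], write_row: Callable[[Row], None]) -> int:
    """Send each row to the (hypothetical) monitor driver; return the row count."""
    count = 0
    for row in rows:
        write_row(row)
        count += 1
    return count
```

A call such as display(transform(db_access(raster), invert), driver.write_row) has the same shape as the expression above; no row is read from the database before the display asks for it.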



6. Outlook and Future Work 

Design and implementation along the presented line have begun. A simple version of the 
prototype handling images is expected to be operational at the end of 1988. There are four 
major areas of continuing development:

- investigation of the various search issues on description data

- other attribute domains like text and sound 

- integration with an object-oriented data model 

- user interfaces and applications 

The management of the description data is a central issue in our proposal, and syntax and seman- 
tics of the search expressions as well as the internal organization (indexing) have to be designed 
carefully. To support the full range of multimedia applications, the database system must offer 
data types like text and sound as well. They will be included in the query language in a way 
similar to that of IMAGE. However, their access functions will be different. In particular, the
proper treatment of sound relies on some real-time features of the DBMS: When a recorded
sound sequence is to be heard through a speaker, the DBMS must deliver the data fast enough to
guarantee uninterrupted and timely replay. At the same time, the volume of data increases drastically.
Investigations into this special data type are about to begin.

Once all the new data types and their access functions have been tested thoroughly, it is 
time to think about their integration into an object-oriented data model. This should be much 
easier than with the relational model. It should also be easier to design and implement new appli- 
cations using the object-oriented DBMS. The user interface can be enhanced with sophisticated 
query and browsing facilities. In addition, applications can be built to provide higher-level
objects like documents or hypertext to the user, including the operations to access and manipulate
them. Enough experience should then be available to discuss new storage methods and devices,
such as optical disk, that may be more appropriate for multimedia data than the standard
magnetic disk, and to consider their integration into the new DBMS.